Support resumability of DynamoDB migrations #190

Merged 3 commits on Aug 29, 2024

julienrf (Collaborator) commented Aug 1, 2024

  • Track in savepoint files the scan segments that have been fully migrated
  • Generalize the savepoints management to support both CQL and DynamoDB
  • Update documentation accordingly
  • Tested the changes with the following scenario:
    1. create a source table with 300 items containing random data
    2. partially migrate some items (using scanSegments: 3 and skipSegments: [1, 2])
    3. validate that some items have been migrated
    4. migrate the remaining items (using scanSegments: 3 and skipSegments: [0])
    5. validate the full migration using our validator
      Limitation: this scenario does not test that the savepoints are correctly created. I am not sure how to test that properly, since this is time-sensitive.
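
The two-step scenario above can be illustrated with a minimal sketch. It assumes that `scanSegments` is the total number of scan segments (in the sense of DynamoDB's parallel scan, where each worker reads one segment) and that `skipSegments` lists the segment indices a run leaves out. The helper name `segments_to_scan` is hypothetical and is not part of the migrator.

```python
def segments_to_scan(scan_segments, skip_segments):
    """Return the segment indices a run would still scan.

    Hypothetical helper: it only models the arithmetic of the test
    scenario, not the migrator's actual implementation.
    """
    skipped = set(skip_segments)
    return [s for s in range(scan_segments) if s not in skipped]


# Step 2 of the scenario: partial migration, segments 1 and 2 are skipped.
first_run = segments_to_scan(3, [1, 2])

# Step 4 of the scenario: migrate the rest, segment 0 is skipped.
second_run = segments_to_scan(3, [0])

# Together, the two runs cover every segment exactly once.
covered = sorted(first_run + second_run)
```

Under these assumptions the first run scans only segment 0, the second run scans segments 1 and 2, and their union is the full table.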

Fixes #165

Blocked by #183 and scylladb/emr-dynamodb-connector#7.

@julienrf julienrf force-pushed the dynamodb-resumability branch 2 times, most recently from a411f60 to 95c9e34 Compare August 8, 2024 13:53
julienrf (Collaborator, Author) commented Aug 8, 2024

I think this feature requires more thorough testing. Ideally, we should have a dedicated documentation page that describes precisely what users will see and what they should do, and that clearly explains the limitations of the current implementation.
To achieve this, we need a dataset of several GB, so that the migration takes more than a few seconds and we can interrupt it abruptly in the middle and then resume it.

julienrf (Collaborator, Author) commented Aug 11, 2024

I performed another test this morning with the following scenario:

  1. Create a DynamoDB table with 1,000,000 random items
  2. Start a migration to ScyllaDB Alternator with 10 scan segments, and a savepoints schedule of 20s
  3. Observe the following lines in the logs:
    24/08/11 08:06:20 INFO DynamoDbSavepointsManager: Created a savepoint config at /app/savepoints/savepoint_1723363580.yaml due to schedule. Segments to skip: Set()
    24/08/11 08:06:39 INFO DynamoDbSavepointsManager: Marked segments [1] as migrated.
    24/08/11 08:06:39 INFO DynamoDbSavepointsManager: Marked segments [0] as migrated.
    24/08/11 08:06:40 INFO DynamoDbSavepointsManager: Created a savepoint config at /app/savepoints/savepoint_1723363600.yaml due to schedule. Segments to skip: Set(0, 1)
    
    These logs mean that the accumulator is correctly updated when segments have been completely migrated, and the content of the accumulator is correctly dumped into the savepoint files.
  4. Kill the Spark job before the migration completed
  5. Run the validator to check that the migration was indeed incomplete
  6. Start another migration using the latest savepoint file as configuration
  7. Observe that the segments that were marked as migrated were correctly skipped for the second migration
  8. After the migration completed, run the validator to check that all the items are present
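
The behaviour seen in the logs of step 3 (segments are marked as migrated, then the next scheduled savepoint lists them as segments to skip) can be sketched as follows. This is an illustration under assumptions, not the code of `DynamoDbSavepointsManager`: the class `SegmentTracker`, its methods, and the shape of the configuration dictionary are all hypothetical. Only the file-name pattern and the values are taken from the log lines above.

```python
import threading


class SegmentTracker:
    """Hypothetical stand-in for the accumulator of fully migrated segments."""

    def __init__(self, already_skipped=()):
        self._lock = threading.Lock()
        # Segments skipped by a previous run stay skipped in new savepoints.
        self._done = set(already_skipped)

    def mark_migrated(self, segment):
        """Record that a scan segment has been completely migrated."""
        with self._lock:
            self._done.add(segment)

    def savepoint(self, base_config, now_epoch_seconds, directory):
        """Return (path, config) for a savepoint taken at the given time.

        The config is a copy of the base config with skipSegments set to
        the segments migrated so far; the base config is not modified.
        """
        with self._lock:
            skip = sorted(self._done)
        config = dict(base_config)
        config["skipSegments"] = skip
        path = f"{directory}/savepoint_{now_epoch_seconds}.yaml"
        return path, config


# Replay of the log excerpt: one savepoint before any segment completes,
# then segments 1 and 0 complete, then a second scheduled savepoint.
tracker = SegmentTracker()
base = {"scanSegments": 10}
path0, config0 = tracker.savepoint(base, 1723363580, "/app/savepoints")
tracker.mark_migrated(1)
tracker.mark_migrated(0)
path1, config1 = tracker.savepoint(base, 1723363600, "/app/savepoints")
```

In this sketch, resuming amounts to starting a new run with the latest savepoint's configuration, so that segments already listed in `skipSegments` are not scanned again. A segment that was only partially migrated when the job was killed is not listed and would be scanned again from the start.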

I then performed the following variation:

  • Instead of killing the migration job, I stopped the ScyllaDB service. In that case, the migration job keeps retrying to write the data until the ScyllaDB service is up again, and I could observe that the migration was no longer progressing. I then killed the migration job as well and restarted it from the latest savepoint file.
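
Picking "the latest savepoint file" can be done by hand, but a small sketch may help when many files have accumulated. It assumes the naming pattern seen in the logs, `savepoint_<number>.yaml`, and that the number is a timestamp that grows over time. The function `latest_savepoint` is a hypothetical helper, not something the migrator provides.

```python
import re

# Matches names such as savepoint_1723363600.yaml and captures the number.
SAVEPOINT_NAME = re.compile(r"savepoint_(\d+)\.yaml$")


def latest_savepoint(file_names):
    """Return the savepoint name with the largest numeric suffix, or None."""
    best_name = None
    best_stamp = -1
    for name in file_names:
        match = SAVEPOINT_NAME.search(name)
        if match and int(match.group(1)) > best_stamp:
            best_name = name
            best_stamp = int(match.group(1))
    return best_name


# File names taken from the log excerpt above, plus an unrelated file.
names = ["savepoint_1723363580.yaml", "savepoint_1723363600.yaml", "notes.txt"]
chosen = latest_savepoint(names)
```

Comparing the numeric suffix rather than sorting the names as strings avoids surprises if the suffixes ever differ in length.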

@julienrf julienrf force-pushed the dynamodb-resumability branch from 95c9e34 to feeb7b5 Compare August 11, 2024 09:11
@julienrf julienrf force-pushed the dynamodb-resumability branch from 813a2ea to b1c4589 Compare August 22, 2024 15:34
@julienrf julienrf requested a review from tarzanek August 22, 2024 15:35
guy9 (Collaborator) commented Aug 25, 2024

@tarzanek ping

julienrf (Collaborator, Author) commented Aug 28, 2024

All the PRs that this PR depends on have been merged. We now need to cut a new release of the emr-dynamodb-connector to unblock this PR. I will work on that ASAP.

guy9 (Collaborator) commented Aug 28, 2024

Great, thanks!

@julienrf julienrf force-pushed the dynamodb-resumability branch from b1c4589 to 7d006a6 Compare August 28, 2024 11:04
@julienrf julienrf marked this pull request as ready for review August 28, 2024 11:04
tarzanek (Contributor) left a comment

LGTM
I like the manager superclass; this way it will be easy to fix anything for both cases.

@tarzanek tarzanek merged commit 311931a into scylladb:master Aug 29, 2024
3 checks passed
@julienrf julienrf deleted the dynamodb-resumability branch August 29, 2024 16:36
Successfully merging this pull request may close these issues.

Savepoints are not supported for Alternator/DynamoDB